Intrusion Detection using unsupervised learning

Authors

  • Kusum bharti
  • Sanyam Shukla
  • Shweta Jain
Abstract

Clustering is one of the efficient data mining techniques for intrusion detection. Among clustering algorithms, k-means clustering is widely used for intrusion detection because it gives efficient results on huge datasets. However, k-means clustering sometimes fails to give the best result because of the class dominance problem and the no class problem. To remove these problems, we propose two new algorithms for cluster-to-class assignment. According to our experimental results, the proposed algorithms achieve high precision and recall for classes with few instances.

Keywords: feature selection, k-means clustering, fuzzy k-means clustering, KDD Cup 99 dataset

Introduction

An intrusion is a sequence of related activities that performs unauthorized access to useful information or unauthorized file modification, causing harm. An intrusion detection system monitors the events occurring in a computer system or network environment and examines them for signs of possible incidents, that is, violations of, or imminent threats to, computer security or standard security practices. Various techniques have been used for intrusion detection, and data mining is one of the efficient ones. Data mining uses two kinds of learning: supervised and unsupervised. Clustering is an unsupervised learning technique that partitions a dataset into subgroups based on observations. Data points that belong to the same cluster share common properties. Distance measures are used to find the similarity between data points; many papers use the Euclidean distance for this purpose.

This paper is organized as follows: Section I gives an overview of related work, Section II gives the basic concept of k-means clustering, and Section III presents the architecture of the proposed model. Section IV summarizes the obtained results with comparison and discussion. Section V concludes the paper along with future work.

I. RELATED WORKS

The authors of [1-3] have used k-means clustering for intrusion detection. The performance of k-means clustering is affected by the initial cluster centers and by the number of cluster centroids. Zhang Chen et al. [4] proposed a new concept for selecting the number of clusters: an initial number of clusters is chosen for the dataset, and the subclusters are then combined or divided based on defined measures. Mark Junjie Li et al. [5] proposed an extension to the standard fuzzy k-means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. Mrutyunjaya Panda et al. [6] used k-means and fuzzy k-means for intrusion detection. Sometimes k-means clustering does not give the best results for large datasets. To remove this problem, Yu Guan et al. [7] introduced Y-means, a variation of k-means clustering that removes the dependency and degeneracy problems of k-means. Sometimes a single clustering algorithm does not give the best result; to address this, Fangfei Weng et al. [8] used k-means clustering with a new concept called ensemble k-means clustering. Cuixiao Zhang et al. [9] used KD clustering for intrusion detection. Some authors have combined k-means clustering with other methods to improve the detection rate of intrusion detection systems: the authors of [9-13] used k-means clustering along with other data mining techniques, and the authors of [14] used an ANN along with fuzzy k-means clustering, which removes problems related to the ANN.
All of these techniques improve the detection rate for intrusion detection, but none is able to solve the class dominance problem of k-means clustering. To remove this problem, we propose two new algorithms that remove the class dominance problem along with the no class problem. In the class dominance problem, classes with few instances (i.e., R2L and U2R) are dominated by classes with many instances. In the no class problem, some of the clusters are assigned to no class.

II. EXISTING TECHNIQUES

K-means clustering is an unsupervised machine learning technique [17]. It was first proposed by James MacQueen in 1967.

Algorithm

Input: a dataset to be clustered containing N instances; k, the number of clusters needed; k centroids selected randomly from the dataset.
Output: the dataset partitioned into k clusters that satisfy the convergence criterion.

Step 1 (Initialization): Initialize k clusters along with k centroids.
Step 2 (Assignment): Assign each data point to the corresponding cluster based on a distance measure (mostly the Euclidean distance is used [18]):

$$d(p, q) = \sqrt{\sum_{i=1}^{m} (p_i - q_i)^2}$$

where $p$ and $q$ are two points in Euclidean space with $m$ coordinates each.
Step 3 (Recalculation): After assigning each data point to its cluster, recalculate the centroid of each cluster (the mean of the cluster).
Step 4 (Repeat): Repeat steps 2 and 3 until the convergence criterion is met.

ISSN : 0975-3397 1865 Kusum bharti et. al. / (IJCSE) International Journal on Computer Science and Engineering Vol. 02, No. 05, 2010, 1865-1870

K-means [19] is an algorithm for partitioning (or clustering) N data points into k disjoint subsets $S_j$ containing $N_j$ data points so as to minimize the sum-of-squares criterion

$$J = \sum_{j=1}^{k} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$

where $x_n$ is a vector representing the n-th data point and $\mu_j$ is the geometric centroid of the data points in $S_j$. Along with minimizing the sum-of-squares criterion, two more criteria are also considered: inter-cluster and intra-cluster distance.

A. Existing class-to-cluster mapping used in Weka

Cluster-to-class mapping, the no class problem, and class dominance are key problems in k-means clustering. The machine learning tool WEKA [20] uses the number of instances to assign a cluster to a particular class. The algorithm used by Weka for cluster-to-class mapping is as follows.

Weka_Cluster_Class_Mapping_Algorithm:

Step 1. Class-wise analysis: For each class, search for the cluster that contains the majority of the instances of that class. After this step, we know for each class the number of the cluster that contains the maximum number of instances of that class.
Step 2. Cluster-wise analysis: Analyze each cluster on the basis of the results obtained in the previous step.
  a. If a cluster contains the maximum number of instances of only one particular class, the cluster is assigned to that class.
  b. If a cluster contains the maximum number of instances of more than one class, the cluster is assigned to the class with the greater number of instances.
  c. A cluster that does not contain the maximum number of instances of any class is assigned to no class.

In the above approach, no more than one cluster can be assigned to a single class. This approach suffers from both the class dominance problem and the no class problem.
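The k-means steps of Section II and the Weka-style cluster-to-class mapping described above can be sketched roughly as follows. This is a minimal illustration in pure Python, not the authors' implementation and not Weka's source code. The function names (euclidean, kmeans, weka_style_mapping), the seed and max_iter parameters, and the toy data are hypothetical. Step 2b is read here as comparing instance counts inside the contested cluster, which is an assumption about the text.

```python
import math
import random
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def weka_style_mapping(cluster_ids, class_labels, k):
    """Map each cluster to a class, or to None ("no class")."""
    # counts[(cluster, cls)] = number of instances of cls in that cluster
    counts = Counter(zip(cluster_ids, class_labels))
    # Step 1 (class-wise): the cluster holding most instances of each class.
    best_cluster = {
        cls: max(range(k), key=lambda c: counts[(c, cls)])
        for cls in sorted(set(class_labels))
    }
    # Step 2 (cluster-wise): a cluster goes to the class that selected it;
    # if several classes selected it, the one with more instances inside
    # the cluster wins (assumption); an unselected cluster gets no class.
    mapping = {}
    for c in range(k):
        candidates = [cls for cls, b in best_cluster.items() if b == c]
        if candidates:
            mapping[c] = max(candidates, key=lambda cls: counts[(c, cls)])
        else:
            mapping[c] = None
    return mapping


# Hand-made toy assignment (hypothetical data) showing both problems.
demo_clusters = [0] * 5 + [0] * 2 + [1] * 4 + [1] * 1 + [2] * 2 + [2] * 1
demo_classes = (["normal"] * 5 + ["u2r"] * 2 + ["dos"] * 4
                + ["normal"] * 1 + ["normal"] * 2 + ["u2r"] * 1)
mapping = weka_style_mapping(demo_clusters, demo_classes, k=3)
print(mapping)  # {0: 'normal', 1: 'dos', 2: None}
```

In this toy case, cluster 0 is selected by both "normal" and "u2r" and goes to "normal", so the low-instance class "u2r" receives no cluster (class dominance), while cluster 2 is selected by no class and is left unassigned (no class).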

Similar resources

Alert correlation and prediction using data mining and HMM

Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...

Learning Intrusion Detection: Supervised or Unsupervised?

Application and development of specialized machine learning techniques is gaining increasing attention in the intrusion detection community. A variety of learning techniques proposed for different intrusion detection problems can be roughly classified into two broad categories: supervised (classification) and unsupervised (anomaly detection and clustering). In this contribution we develop an ex...

Clustering-based Network Intrusion Detection

Recently data mining methods have gained importance in addressing network security issues, including network intrusion detection—a challenging task in network security. Intrusion detection systems aim to identify attacks with a high detection rate and a low false alarm rate. Classification-based data mining models for intrusion detection are often ineffective in dealing with dynamic changes in ...

Improving Self Organizing Map Performance for Network Intrusion Detection

The continuous evolution of the types of attacks against computer networks suggests a paradigmatic shift from misuse based intrusion detection system to anomaly based systems. Unsupervised learning algorithms are natural candidates for this task, but while they have been successfully applied in host-based intrusion detection, network-based applications are more difficult, for a variety of reaso...

Improving Detection of Wi-Fi Impersonation by Fully Unsupervised Deep Learning

Intrusion Detection System (IDS) has been becoming a vital measure in any networks, especially Wi-Fi networks. Wi-Fi networks growth is undeniable due to a huge amount of tiny devices connected via Wi-Fi networks. Regrettably, adversaries may take advantage by launching an impersonation attack, a common wireless network attack. Any IDS usually depends on classification capabilities of machine l...

Analyzing TCP Traffic Patterns Using Self Organizing Maps

The continuous evolution of the attacks against computer networks has given renewed strength to research on anomaly based Intrusion Detection Systems, capable of automatically detecting anomalous deviations in the behavior of a computer system. While data mining and learning techniques have been successfully applied in host-based intrusion detection, network-based applications are more difficul...

Journal title: (IJCSE) International Journal on Computer Science and Engineering

Volume 02  Issue 05

Pages 1865-1870

Publication date 2010